METHODS AND APPARATUS FOR ADAPTIVE SIGNAL PROCESSING
INVOLVING A KARHUNEN-LOEVE BASIS 

Field of the Invention

The present invention relates generally to signal processing techniques and devices, and more 
particularly to signal processing techniques and devices which involve the utilization of a Karhunen- 
Loeve (KL) basis. 

Background of the Invention

Many areas of signal processing utilize so-called eigendecompositions of covariance 
matrices. Such decompositions generally provide a basis, commonly known as a Karhunen-Loeve 
(KL) basis, in which signal expansion coefficients are uncorrelated. The KL basis or a signal 
represented in the KL basis can immediately reveal crucial information in many applications. For
example, in an array signal processing application, the KL basis is used to estimate directions of
arrival of plane waves. This application allows the received signal to be separated into a signal 
subspace and a noise subspace, and is therefore called a subspace method. The signal space often 
changes with time, necessitating the use of subspace tracking techniques. 

Transform coding is another area in which it is useful to track a KL basis. In transform
coding, a KL basis gives the best representation for encoding a Gaussian vector with scalar
quantization and scalar entropy coding. When a KL basis is not known in advance or varies slowly, 
adaptive estimation or tracking can enhance performance. 

Gradient methods for adaptive signal-subspace and noise-subspace estimation are described
in J.-F. Yang et al., "Adaptive Eigensubspace Algorithms for Direction or Frequency Estimation and
Tracking," IEEE Trans. Acoust. Speech Signal Proc., Vol. 36, No. 2, pp. 241-251, February 1988.
However, these methods are complicated both numerically and theoretically by orthonormalization 
steps. Adaptation of a Givens parameterized transform, which eliminates the need for explicit 
orthonormalization, was suggested as a method for finding a Karhunen-Loeve transform in P.A.
Regalia, "An Adaptive Unit Norm Filter With Applications to Signal Analysis and Karhunen-Loeve
Transformations," IEEE Trans. Circuits and Systems, Vol. 37, No. 5, pp. 646-649, May 1990.
However, this approach fails to provide suitable convergence results. Subsequent work either does
not address step size selection, e.g., P.A. Regalia et al., "Rational Subspace Estimation Using
Adaptive Lossless Filters," IEEE Trans. Signal Proc., Vol. 40, No. 10, pp. 2392-2405, October 1992,
or considers only the most rigorous form of convergence in which step sizes must shrink to zero,
gradually but not too quickly, e.g., J.-P. Delmas, "Performances Analysis of Parameterized Adaptive
Eigensubspace Algorithms," Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc., Detroit, MI,
pp. 2056-2059, May 1995, J.-P. Delmas, "Adaptive Harmonic Jammer Canceler," IEEE Trans.
Signal Proc., Vol. 43, No. 10, pp. 2323-2331, October 1995, and J.-P. Delmas, "Performances
Analysis of a Givens Parameterized Adaptive Eigenspace Algorithm," Signal Proc., Vol. 68, No. 1,
pp. 87-105, July 1998. The latter form of adaptation, with step sizes approaching zero, is generally
not suitable for use in practical applications.

For details on other conventional basis tracking techniques, see J.-F. Yang et al., "Adaptive 
High-Resolution Algorithms For Tracking Nonstationary Sources Without the Estimation of Source
Number," IEEE Trans. Signal Proc., Vol. 42, pp. 563-571, March 1994, P.A. Regalia, "An Unbiased
Equation Error Identifier and Reduced-Order Approximations," IEEE Trans. Signal Proc., Vol. 42,
No. 6, pp. 1397-1412, June 1994, B. Champagne, "Adaptive Eigendecomposition of Data
Covariance Matrices Based on First-Order Perturbations," IEEE Trans. Signal Proc., Vol. 42, No.
10, pp. 2758-2770, October 1994, B. Yang, "Projection Approximation Subspace Tracking," IEEE
Trans. Signal Proc., Vol. 43, No. 1, pp. 95-107, January 1995, and B. Champagne et al., "Plane
Rotation-Based EVD Updating Schemes For Efficient Subspace Tracking," IEEE Trans. Signal
Proc., Vol. 46, No. 7, pp. 1886-1900, July 1998.

The above-identified conventional techniques for tracking a KL basis fail to provide adequate 
performance, such as local convergence within specified step size bounds. In addition to exhibiting 
a lack of suitable convergence guarantees, most conventional techniques are computationally 
complicated and not highly parallelizable. A need therefore exists for improved techniques for
tracking a KL basis.

Summary of the Invention

The present invention provides improved signal processing techniques and devices for 
Karhunen-Loeve (KL) basis tracking through the use of stochastic approximations to specified 
gradient descents. In order to keep computational requirements low and to provide a degree of 
parallelizability, the signal processing techniques and devices of the present invention utilize
computations which have locality and regularity properties. 

In accordance with the invention, a transform is represented in a reduced-parameter form, 
such as a Givens parameterized form or a Householder form, such that the reduced-parameter form 
for an $N \times N$ transform comprises fewer than $N^2$ parameters. An updating process for the transform
is implemented using computations involving the reduced-parameter form, and an adaptation of the
transform is represented directly as one or more changes in the reduced-parameter form. 

An illustrative embodiment of the invention provides a gradient descent algorithm 
particularly well suited for transform adaptation in transform coding applications. The algorithm 
can be implemented in one or more signal processing devices, such as a digital signal processor 
(DSP), filter, encoder, decoder, etc. The algorithm in the illustrative embodiment uses a pairwise
energy compaction property of the KL transform. Advantageously, the algorithm converges locally 
in mean within specified step size bounds. In addition, the algorithm allows trade-offs to be made 
between speed of convergence and steady-state error. It is particularly simple to implement and 
exhibits a rate of convergence which is generally better than that of more computationally 
demanding conventional approaches.

The invention provides a number of other advantages over conventional techniques. For 
example, the invention avoids the need for an explicit eigendecomposition operation in 
implementing the transform, and does not require explicit estimation of the autocorrelation matrix 
of the source signal being processed. In addition, the invention provides better parallelizability of 
computations than the previously-noted conventional techniques, and allows the transform update
process to be driven by quantized data. 

The algorithm in the illustrative embodiment preferably operates on Givens parameterized 
matrices, i.e., matrices which are a product of multiple Givens rotations, so as to ensure 
orthogonality, increase numerical robustness, and minimize the number of parameters. 

Although particularly well suited for use in transform coding applications, the invention can
also be applied to other types of signal processing applications, including array signal processing, 
general-purpose signal filtering, or other applications involving a KL basis. Moreover, the 
techniques of the invention are applicable to a wide variety of different types of signals, including 
data, speech, audio, images, video and other types of signals. 

Brief Description of the Drawings 

FIG. 1 is a schematic representation of a signal processing device which implements a Givens 
parameterized form of an orthogonal transform in accordance with the invention. 

FIG. 2 is a schematic representation of a signal processing device which implements a 
stochastic approximation of a gradient descent algorithm in a Givens parameterized form in 
accordance with the invention. 

FIGS. 3A and 3B show plots of deterministic simulations which demonstrate differences in 
rates of mean convergence for gradient descent algorithms as applied to example source signals. 

FIGS. 4A, 4B and 4C show plots of stochastic simulations for gradient descent algorithms. 

FIG. 5 shows a schematic representation of a signal processing device for backward adaptive 
transform coding in accordance with the invention. 

FIG. 6 shows plots of simulations of a backward adaptive version of a gradient descent 
algorithm in accordance with the invention. 

Detailed Description of the Invention 

The invention will be illustrated below in conjunction with exemplary signal processing 
techniques and devices. The techniques and devices described may be applied to processing of a 
wide variety of different types of signals, including data signals, speech signals, audio signals, image 
signals, and video signals, in either compressed or uncompressed formats. The term "vector" as used 
herein is intended to include any grouping of coefficients or other components representative of at 
least a portion of a signal. The term "signal processing device" is intended to include any type of 
device suitable for providing at least a subset of the processing operations of the present invention, 
such as a digital signal processor (DSP), microprocessor, computer, application-specific integrated
circuit (ASIC), filter, encoder, decoder, etc. The particular device implementation used in a given
application will of course be a function of the particular requirements of that application. 

As described previously, it is often desirable in signal processing applications to update a 
Karhunen-Loeve (KL) basis estimate as data vectors are processed. An illustrative embodiment of 
the invention to be described below provides a Givens parameterized gradient descent algorithm for 
updating a KL basis estimate. This embodiment, which is based on pairwise maximization of coding
gain, is contrasted herein to two conventional algorithms based on stochastic gradient descents of 
global cost functions. Although all three of these algorithms can be shown to converge locally in 
mean, the algorithm of the illustrative embodiment of the invention converges much faster than the 
other two while also being computationally the simplest of the three. It should be emphasized, 
however, that the illustrative algorithm is presented by way of example, and should not be construed 
as limiting the scope of the invention. 

The three algorithms to be described herein are also referred to as Algorithm 1, Algorithm 
2 and Algorithm 3. It should be understood that Algorithms 1 and 2 are the conventional algorithms 
based on stochastic gradient descents of global cost functions, and are described for purposes of 
comparison with Algorithm 3, which is the illustrative embodiment of the invention based on 
pairwise maximization of coding gain. 

The approach utilized in Algorithms 1, 2 and 3 may be analogized in some respects to Wiener
filtering, as described in, e.g., P.M. Clarkson, "Optimal and Adaptive Signal Processing," CRC 
Press, Boca Raton, FL, 1993, which is incorporated by reference herein. For optimal 
finite-impulse-response Wiener filtering, one requires knowledge of second-order moments of 
signals and one needs to solve a linear system of equations, which in some contexts may be 
considered expensive. The linear system of equations is commonly replaced with a gradient descent 
and the exact moments are replaced with immediate stochastic estimates, leading to the well-known 
least-mean-squares (LMS) algorithm. By analogy, determining the optimal transform for transform 
coding requires the determination of an eigendecomposition, typically an expensive computation. 
Algorithms 1, 2 and 3 replace this computation with a gradient descent. It should also be noted that
both the LMS algorithm and the gradient descent algorithms can be driven by quantized data.

The following notation will be used in describing Algorithms 1, 2 and 3. Let $\{x_n\}_{n \in \mathbb{N}}$ be a
sequence of $\mathbb{R}^N$-valued random vectors and let $X_n = E[x_n x_n^T]$. It is assumed for simplicity and
clarity of illustration that the dependence of $X_n$ on $n$ is mild. The algorithms are designed to produce
a sequence of orthogonal transforms $T_n$ such that $T_n X_n T_n^T$ is approximately diagonal for each $n$. The
algorithms are preferably strictly causal, i.e., $T_n$ should depend only on $\{x_k\}_{k=0}^{n-1}$. In general, $X_n$ is
not known, but instead must be inferred or otherwise estimated from $\{x_k\}_{k=0}^{n-1}$.

An orthogonal $T_n$ such that $T_n X_n T_n^T$ is diagonal is called a Karhunen-Loeve transform (KLT)
of the source vector $x_n$. Finding a KLT is equivalent to finding an orthonormal set of eigenvectors
of the symmetric, positive semidefinite covariance matrix $X_n$, i.e., solving a so-called symmetric
eigenproblem, as described in G.H. Golub et al., "Matrix Computations," Johns Hopkins Univ. 
Press, Baltimore, MD, Second Edition, 1989, which is incorporated by reference herein. Thus, 
finding an algorithm for transform adaptation may be viewed as an attempt to approximately solve 
a sequence of symmetric eigenvalue problems with noisy data. 

The present invention avoids the need for an explicit eigendecomposition by using incoming 
data samples to make small adjustments in the transform so that changes in the KL basis are tracked 
and any initial misadjustment is eliminated. Algorithms 1 and 2 make the transform adjustments in 
the direction of the negative gradient of a cost function that is minimized by the desired 
diagonalizing transform. Examples of such cost functions and corresponding gradient descent 
analysis with respect to these functions will be given below. Algorithm 3 applies a gradient descent 
to an energy compaction property that holds for any pair of transform coefficients, instead of 
minimizing a single global cost function through gradient descent. 

Certain additional notation used in the description of Algorithms 1, 2 and 3 will now be 
introduced. A Givens rotation $G$ of $\theta$ radians counterclockwise in the $(i, j)$ coordinate plane is an
identity matrix except for

$$\begin{bmatrix} G_{ii} & G_{ij} \\ G_{ji} & G_{jj} \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$

This rotation will be denoted $G_{i,j,\theta}$. A Givens rotation is also sometimes referred to as a Jacobi
rotation.

Any $N \times N$ orthogonal matrix can be factored as a product of $K = N(N-1)/2$ Givens rotations
with $i < j$ and each rotation angle $\theta \in [-\pi/4, \pi/4)$. See, for example, F.J. Vanpoucke et al., "Factored
Orthogonal Transformations For Recursive Eigendecomposition," IEEE Trans. Circuits Syst.-II:
Anal. Dig. Sig. Proc., Vol. 44, No. 3, pp. 253-256, March 1997, which is incorporated by reference
herein.

Givens parameterizations of orthogonal matrices will be used extensively in the description,
so the following notation that eliminates one subscript will be used. To index only the needed $(i, j)$
pairs, map $(i, j)$ to $k$, where $(i, j)$ is the $k$th entry in a lexicographical list of $(i, j) \in \{1, 2, \ldots, N\}^2$
pairs with $i < j$. The reverse of this index mapping is denoted by $(i_k, j_k)$ being the pair corresponding
to index $k$. A Givens parameterized representation is given by

$$U_\Theta = G_{1,\theta_1} G_{2,\theta_2} \cdots G_{K,\theta_K},$$

where $\Theta = (\theta_1, \theta_2, \ldots, \theta_K)$ and the indices have been remapped according to $G_{k,\theta} = G_{i_k,j_k,\theta}$.

FIG. 1 shows a signal processing device which implements the above-described Givens
parameterized form of an orthogonal transform with the index remapping for the case of $N = 4$. Each
$G_k$ block in the figure represents a Givens rotation matrix. The Givens parameterized form ensures
orthogonality and minimizes the number of parameters. Matrices in this form can be multiplied
using conventional techniques such as those described in, e.g., the above-cited F.J. Vanpoucke et al.
reference. This is an important aspect of the present invention, as it produces a multiplicative update
matrix in Givens parameterized form, i.e., an orthogonal matrix $U_{\Theta_n}$ in Givens form is generated
such that $T_{n+1} = U_{\Theta_n} T_n$.
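
The Givens parameterized form described above can be sketched in a few lines of plain Python. The fragment below builds one rotation per lexicographically ordered pair $(i_k, j_k)$, multiplies them into $U_\Theta$, and checks that the product is orthogonal. The helper names (`givens`, `index_pairs`, `givens_product`), the 0-based indexing, and the sign convention of the 2x2 block (taken from the rotation-stage equation used for Algorithm 3) are assumptions made for illustration only.

```python
import math

def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def givens(n, i, j, theta):
    """Identity matrix except for a 2x2 rotation block in the (i, j) plane."""
    g = identity(n)
    c, s = math.cos(theta), math.sin(theta)
    g[i][i], g[i][j] = c, s
    g[j][i], g[j][j] = -s, c
    return g

def index_pairs(n):
    """Lexicographic list of (i, j) with i < j; entry k is (i_k, j_k), 0-based."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

def givens_product(n, thetas):
    """U_Theta = G_1 G_2 ... G_K for K = n(n-1)/2 rotation angles."""
    pairs = index_pairs(n)
    assert len(thetas) == len(pairs)
    u = identity(n)
    for (i, j), theta in zip(pairs, thetas):
        u = matmul(u, givens(n, i, j, theta))
    return u

N = 4
thetas = [0.10, -0.20, 0.30, 0.05, -0.15, 0.25]  # arbitrary example angles
U = givens_product(N, thetas)
UUt = matmul(U, transpose(U))
max_dev = max(abs(UUt[r][c] - (1.0 if r == c else 0.0))
              for r in range(N) for c in range(N))
print(len(index_pairs(N)), max_dev)
```

Because every factor is itself orthogonal, the product stays orthogonal for any choice of angles, which is why no explicit orthonormalization step is needed.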

Global Gradient Descent Algorithms (Algorithms 1 and 2) 

This section will consider diagonalizing a fixed matrix $X$, so its time index is not shown. The
problem of finding a KLT $T$ corresponding to a covariance matrix $X$ can be cast as a minimization
problem if there is a cost function that is minimized if and only if T is a KLT. Many such cost 
functions exist, and two are described in greater detail below, along with corresponding analyses of 
deterministic gradient descents of these cost functions. These analyses lead directly to 
convergence-in-mean results for the stochastic transform update algorithms, i.e., Algorithms 1 and 
2. 

One type of example cost function is the squared 2-norm of the off-diagonal elements of
$Y = TXT^T$:

$$J_1(Y) = \sum_{i \neq j} Y_{ij}^2.$$

This cost function attains its minimum value of zero if and only if T exactly diagonalizes X. 
Another example of a cost function is

$$J_2(Y) = \prod_{i=1}^{N} Y_{ii},$$

which, although less clearly connected to the diagonalization of $X$, is also minimum when $T$
diagonalizes $X$. This can be seen as follows. As $T$ varies over the set of orthogonal matrices, the
diagonal elements of $Y$ lie in the convex hull of the permutations of the eigenvalues of $X$, as
described in A.W. Marshall et al., "Inequalities: Theory of Majorization and Its Applications,"
Academic Press, San Diego, CA, 1979, which is incorporated by reference herein. Furthermore, the
sum of the diagonal elements of $Y$ is constant. The minimum of the product occurs when the
elements are as far apart as possible, i.e., at a corner of the convex set. Since the corners are
permutations of the eigenvalues, $J_2$ is minimized when $T$ is a diagonalizing transform.

In high resolution transform coding with optimal bit allocation, the mean-squared error
(MSE) distortion per component at rate $R$ bits per component is proportional to $(J_2(Y))^{1/N}\, 2^{-2R}$. In
this context, it is shown in A. Gersho et al., "Vector Quantization and Signal Compression," Kluwer
Acad. Pub., Boston, MA, 1992, which is incorporated by reference herein, that $J_2$ is minimized when
$T$ is a KLT.

Both of the above-described cost functions $J_1$ and $J_2$ are continuous in each component of $Y$.
It should be noted that many other qualitatively different cost functions may be used, including, e.g.,
non-negative-coefficient linear combinations of the cost functions $J_1$ and $J_2$.
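
To make the two cost functions concrete, the following sketch evaluates the off-diagonal energy $J_1$ and the diagonal product $J_2$ for a 2x2 example, before and after applying a diagonalizing rotation. The example covariance, the helper names, and the closed-form diagonalizing angle for the 2x2 case are illustrative assumptions rather than part of the described embodiment.

```python
import math

def rotate2(x, theta):
    """Return Y = T X T^T for the 2x2 rotation T = [[c, s], [-s, c]]."""
    c, s = math.cos(theta), math.sin(theta)
    t = [[c, s], [-s, c]]
    tx = [[sum(t[r][k] * x[k][col] for k in range(2)) for col in range(2)]
          for r in range(2)]
    return [[sum(tx[r][k] * t[col][k] for k in range(2)) for col in range(2)]
            for r in range(2)]

def j1(y):
    """Sum of squared off-diagonal elements."""
    n = len(y)
    return sum(y[i][j] ** 2 for i in range(n) for j in range(n) if i != j)

def j2(y):
    """Product of the diagonal elements."""
    p = 1.0
    for i in range(len(y)):
        p *= y[i][i]
    return p

X = [[2.0, 1.0], [1.0, 1.0]]  # example covariance (assumed)

# Angle that zeroes the off-diagonal element of T X T^T in the 2x2 case.
theta_klt = 0.5 * math.atan2(2.0 * X[0][1], X[0][0] - X[1][1])
Y_klt = rotate2(X, theta_klt)

# Coarse scan over rotation angles: J2 should never fall below its KLT value.
grid = [k * math.pi / 400 for k in range(400)]
j2_min_on_grid = min(j2(rotate2(X, th)) for th in grid)

print(j1(X), j2(X))          # costs with T = I
print(j1(Y_klt), j2(Y_klt))  # costs with the diagonalizing rotation
```

In this example $J_2$ at the diagonalizing rotation equals the determinant of $X$, which is the product of its eigenvalues.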

Given a cost function $J(Y)$, defined on symmetric positive semidefinite matrices, that is
minimized if and only if $Y$ is diagonal, one skilled in the art of adaptive filtering could approach the
design of an adaptive KL transform as follows. First, compute the gradient

$$\nabla_T J(TXT^T),$$

and then update according to

$$T_{n+1} = T_n - \mu \nabla_T J(TXT^T),$$

where $\mu$ is a small, positive step size. The initial difficulty with such an approach is that the
orthogonality of $T_n$ does not imply the orthogonality of $T_{n+1}$. Furthermore, the gradient will generally
be dense, making computations difficult. Using a descent over the Givens parameterized form of
the transform alleviates the first difficulty, but not the second.

For clarity of illustration, deterministic descents using the cost functions $J_1$ and $J_2$ will now
be described. Using the Givens parameterized representation of the transform matrix, finding a
diagonalizing transform amounts to minimizing $J_1$ or $J_2$ over $\Theta \in [-\pi/4, \pi/4)^K$. A straightforward
gradient descent would update the parameter vector through

$$\Theta_{n+1} = \Theta_n - \mu \nabla_\Theta J_\ell(U_\Theta X U_\Theta^T)\big|_{\Theta = \Theta_n}, \qquad (1)$$

with step size $\mu \in \mathbb{R}^+$, where $\ell \in \{1, 2\}$ selects the cost function. This approach can work, but is
not computationally attractive because the gradient is dense, i.e., each component of $\nabla J_\ell$ depends
on every component of $\Theta$. As an alternative,
gradient descent can be used to compute a multiplicative update matrix. With $Y_n = T_n X T_n^T$, one can
update by

$$T_{n+1} = U_{\Theta_n} T_n, \qquad (2)$$

where

$$\Theta_n = -\mu \nabla_\Theta J_\ell(U_\Theta Y_n U_\Theta^T)\big|_{\Theta = 0}, \qquad (3)$$

with step size $\mu \in \mathbb{R}^+$. As will now be detailed below, the gradient in equation (3) is sparse and easy
to compute for both cost functions $J_1(Y)$ and $J_2(Y)$.

For notational convenience, define $A^{(k)}$, $1 \le k \le K$, elementwise by

$$A^{(k)}_{ij} = \partial Z_{ij} / \partial \theta_k,$$

where

$$Z = U_\Theta Y U_\Theta^T.$$

Expanding $U_\Theta$ and using the product rule of differentiation gives

$$A^{(k)} = G_1 G_2 \cdots G_{k-1} H_k G_{k+1} \cdots G_K Y G_K^T \cdots G_1^T + G_1 G_2 \cdots G_K Y G_K^T \cdots G_{k+1}^T H_k^T G_{k-1}^T \cdots G_2^T G_1^T = B^{(k)} + B^{(k)T}, \qquad (4)$$

where $B^{(k)}$ denotes the first term of the sum and $H_k$ is the derivative with respect to $\theta_k$ of the
Givens matrix $G_k$, i.e., $H_k$ is all zeros except for

$$\begin{bmatrix} (H_k)_{i_k i_k} & (H_k)_{i_k j_k} \\ (H_k)_{j_k i_k} & (H_k)_{j_k j_k} \end{bmatrix} = \begin{bmatrix} -\sin\theta_k & \cos\theta_k \\ -\cos\theta_k & -\sin\theta_k \end{bmatrix}.$$

Evaluating at $\Theta = 0$ makes all the $G_i$ identity matrices. $H_k$ is sparse, giving

$$B^{(k)}\big|_{\Theta = 0} = \begin{bmatrix} Y_{j_k 1} & Y_{j_k 2} & \cdots & Y_{j_k N} \\ -Y_{i_k 1} & -Y_{i_k 2} & \cdots & -Y_{i_k N} \end{bmatrix}, \qquad (5)$$

where only the nonzero entries, in the $i_k$ and $j_k$ rows, are shown.

As another notational aid, define $A^{(k,\ell)}$ elementwise by $A^{(k,\ell)}_{ij} = \partial A^{(k)}_{ij} / \partial \theta_\ell$. Calculation of
$A^{(k,\ell)}$ is simple and similar to that of $A^{(k)}$ and thus is not further described herein. For additional
details, see V.K. Goyal, "Beyond Traditional Transform Coding," Ph.D. Thesis, Univ. California, 
Berkeley, 1998, Published as Univ. California, Berkeley, Electron. Res. Lab. Memo. No. UCB/ERL 
M99/2, January 1999, which is incorporated by reference herein. 

The minimization of $J_1$ by deterministic descent will now be described. Using equation (5),
the components of the gradient of $J_1$ can be expressed as

$$\frac{\partial J_1(U_\Theta Y U_\Theta^T)}{\partial \theta_k} = \sum_{i \neq j} \frac{\partial}{\partial \theta_k} Z_{ij}^2 = 2 \sum_{i \neq j} Z_{ij} A^{(k)}_{ij}. \qquad (6)$$

The partial derivative (6) generally depends on all of the components of $\Theta$, but evaluating at $\Theta = 0$
gives a very simple expression. Cancellation of like terms yields

$$\frac{\partial J_1(U_\Theta Y U_\Theta^T)}{\partial \theta_k}\bigg|_{\Theta = 0} = 4 Y_{i_k j_k} \left( Y_{j_k j_k} - Y_{i_k i_k} \right). \qquad (7)$$

Thus the gradient computation in equation (3) requires a constant number of operations per
parameter, independent of $N$. It should be noted that since the Frobenius norm is unchanged by
orthogonal similarity transformations, minimizing $J_1$ is equivalent to maximizing $J_1'(Y) = \sum_{i=1}^{N} Y_{ii}^2$.
Of course, $J_1 + J_1'$ is a constant, so $\nabla J_1 = -\nabla J_1'$, demonstrating an easy way to obtain equation
(7) and that a gradient ascent of $J_1'$ is equivalent to the above-described gradient descent.

The equilibrium points of a descent with respect to $J_1$ are those for which equation (7) is zero
for all $k$. The desired, stable equilibria are obtained when all $Y_{i_k j_k}$ are zero. The following gives step
size bounds for local convergence around these equilibria.

Denote the eigenvalues of $X$ by $\lambda_1 > \lambda_2 > \cdots > \lambda_N$. It can be shown that the gradient descent
update algorithm described by equations (2), (3), and (7) converges locally to a diagonalizing
transform if

$$0 < \mu < \left[ 2 \max_{i,j} (\lambda_i - \lambda_j)^2 \right]^{-1} = \tfrac{1}{2} (\lambda_1 - \lambda_N)^{-2}.$$



This convergence property of the illustrative gradient descent algorithm can be proven as
follows. Because the interest is in local properties, it can be assumed without loss of generality that
$Y = \mathrm{diag}([\lambda_1\ \lambda_2\ \cdots\ \lambda_N])$. The key to the proof is linearizing the autonomous, nonlinear, discrete-time
dynamical system described by equations (2) and (3). More particularly, describe the system as

$$\Theta_{n+1} = f(\Theta_n),$$

which upon linearization about the desired equilibrium $\Theta = 0$ gives

$$\Theta_{n+1} = (I - \mu F)\Theta_n, \quad \text{where } F_{ij} = \left[ \frac{\partial^2 J_1(U_\Theta Y U_\Theta^T)}{\partial \theta_i\, \partial \theta_j} \right]_{\Theta = 0}.$$

Since $f$ is continuously differentiable, a sufficient condition for local convergence is that the
eigenvalues of $I - \mu F$ lie inside the unit circle. See M. Vidyasagar, "Nonlinear Systems Analysis,"
Prentice-Hall, Englewood Cliffs, NJ, 1978, which is incorporated by reference herein. Evaluation
of $F$ then proceeds as follows. Differentiating equation (6) gives

$$\frac{\partial^2 J_1(U_\Theta Y U_\Theta^T)}{\partial \theta_k\, \partial \theta_\ell} = 2 \sum_{i \neq j} \left( Z_{ij} A^{(k,\ell)}_{ij} + A^{(k)}_{ij} A^{(\ell)}_{ij} \right). \qquad (8)$$

$F$ is obtained by evaluating (8) at $\Theta = 0$. $Z$ reduces to $Y$, which is diagonal, so the first term makes
no contribution. Simplifying the second term using equations (4) and (5) gives

$$\frac{\partial^2 J_1(U_\Theta Y U_\Theta^T)}{\partial \theta_k\, \partial \theta_\ell}\bigg|_{\Theta = 0} = \begin{cases} 4 (\lambda_{i_k} - \lambda_{j_k})^2 & \text{if } k = \ell, \\ 0 & \text{otherwise.} \end{cases} \qquad (9)$$

The eigenvalues of $I - \mu F$ are $1 - 4\mu(\lambda_{i_k} - \lambda_{j_k})^2$. The proof is completed by requiring that these
eigenvalues all lie inside the unit circle.
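
As a rough numerical illustration of the deterministic descent on $J_1$, the sketch below iterates the multiplicative update with the rotation angles set to minus the step size times the expression $4 Y_{i_k j_k}(Y_{j_k j_k} - Y_{i_k i_k})$, and watches the off-diagonal energy shrink. The 3x3 example matrix, the step size, the iteration count, the 0-based indexing and all helper names are assumptions chosen for illustration; the step size is picked to sit below the local bound for this particular matrix.

```python
import math

def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def givens(n, i, j, theta):
    g = identity(n)
    c, s = math.cos(theta), math.sin(theta)
    g[i][i], g[i][j] = c, s
    g[j][i], g[j][j] = -s, c
    return g

def j1(y):
    n = len(y)
    return sum(y[i][j] ** 2 for i in range(n) for j in range(n) if i != j)

def j1_descent_step(t, x, mu):
    """One multiplicative update T <- U_Theta T with gradient-based angles."""
    n = len(x)
    y = matmul(matmul(t, x), transpose(t))
    u = identity(n)
    for i in range(n):
        for j in range(i + 1, n):
            # Angle = -mu * dJ1/dtheta_k evaluated at Theta = 0.
            theta = -mu * 4.0 * y[i][j] * (y[j][j] - y[i][i])
            u = matmul(u, givens(n, i, j, theta))
    return matmul(u, t)

X = [[3.0, 0.2, 0.1],
     [0.2, 2.0, 0.15],
     [0.1, 0.15, 1.0]]   # example matrix, already close to diagonal (assumed)
mu = 0.05                # assumed step size
T = identity(3)
history = []
for _ in range(300):
    history.append(j1(matmul(matmul(T, X), transpose(T))))
    T = j1_descent_step(T, X, mu)

Y = matmul(matmul(T, X), transpose(T))
trace_Y = sum(Y[i][i] for i in range(3))
print(history[0], j1(Y), trace_Y)
```

The trace of $Y$ is unchanged by the orthogonal updates, so it serves as a simple sanity check alongside the decay of $J_1$.
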
The minimization of $J_2$ by deterministic descent will now be described. The gradient of $J_2$
is given elementwise by

$$\frac{\partial J_2(U_\Theta Y U_\Theta^T)}{\partial \theta_k} = \sum_{m=1}^{N} A^{(k)}_{mm} \prod_{i \neq m} Z_{ii}. \qquad (11)$$

To evaluate at $\Theta = 0$, first note that

$$A^{(k)}_{mm}\big|_{\Theta = 0} = 2 B^{(k)}_{mm}\big|_{\Theta = 0} = \begin{cases} 2 Y_{j_k i_k} & \text{if } m = i_k, \\ -2 Y_{i_k j_k} & \text{if } m = j_k, \\ 0 & \text{otherwise.} \end{cases}$$

Thus only the $m = i_k$ and $m = j_k$ terms of the sum contribute, and

$$\frac{\partial J_2(U_\Theta Y U_\Theta^T)}{\partial \theta_k}\bigg|_{\Theta = 0} = 2 Y_{i_k j_k} \left( Y_{j_k j_k} - Y_{i_k i_k} \right) \frac{\prod_{i=1}^{N} Y_{ii}}{Y_{i_k i_k} Y_{j_k j_k}}. \qquad (12)$$

The derivative (12) differs from that of (7) only by a factor of $\tfrac{1}{2} Y_{i_k i_k}^{-1} Y_{j_k j_k}^{-1} \prod_{i=1}^{N} Y_{ii}$. Thus, the
equilibria are the same except that any $Y_{ii} = 0$ gives an equilibrium. The occurrence of $Y_{ii} = 0$ is a
degenerate condition that would be accompanied by a row and column of zeros. It can only occur
if $X$ has a zero eigenvalue, meaning that the source vectors lie in a proper subspace with probability
one. As in the case of minimization of $J_1$ by deterministic gradient descent, one can also prove a
local convergence result for the desired equilibria in the above-described minimization of $J_2$, as
follows.

Denote the eigenvalues of $X$ by $\lambda_1 > \lambda_2 > \cdots > \lambda_N$. The gradient descent described by
equations (2), (3), and (12) converges locally to a diagonalizing transform if

$$0 < \mu < \left[ \max_{i,j} \frac{(\lambda_i - \lambda_j)^2}{\lambda_i \lambda_j} \prod_{\ell=1}^{N} \lambda_\ell \right]^{-1} = \left[ (\lambda_1 - \lambda_N)^2 \prod_{i=2}^{N-1} \lambda_i \right]^{-1}.$$

This convergence result can again be shown by linearizing the autonomous, nonlinear,
discrete-time dynamical system implicitly defined. The result is summarized as

$$F_{k\ell} = \left[ \frac{\partial^2 J_2}{\partial \theta_k\, \partial \theta_\ell} \right]_{\Theta = 0} = \begin{cases} 2 (\lambda_{i_k} - \lambda_{j_k})^2 \prod_{i \neq i_k, j_k} \lambda_i & \text{if } k = \ell, \\ 0 & \text{otherwise.} \end{cases}$$

Requiring the eigenvalues of $I - \mu F$ to lie inside the unit circle completes the proof.
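
The sparse gradient of $J_2$ at $\Theta = 0$ can be checked numerically. The sketch below compares a central finite difference of $J_2$ along a single rotation angle with the closed form $2 Y_{i_k j_k}(Y_{j_k j_k} - Y_{i_k i_k})$ times the product of the remaining diagonal elements, which is one way of writing the derivative at $\Theta = 0$. The example matrix, the finite-difference step, the 0-based indexing and the helper names are assumptions for illustration.

```python
import math

def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def givens(n, i, j, theta):
    g = identity(n)
    c, s = math.cos(theta), math.sin(theta)
    g[i][i], g[i][j] = c, s
    g[j][i], g[j][j] = -s, c
    return g

def j2(y):
    p = 1.0
    for i in range(len(y)):
        p *= y[i][i]
    return p

def j2_along_angle(y, pair, theta):
    """J2(G Y G^T) when only the rotation for `pair` is nonzero."""
    g = givens(len(y), pair[0], pair[1], theta)
    return j2(matmul(matmul(g, y), transpose(g)))

def j2_gradient_at_zero(y, pair):
    """Closed-form derivative of J2 with respect to one angle at Theta = 0."""
    i, j = pair
    rest = 1.0
    for m in range(len(y)):
        if m != i and m != j:
            rest *= y[m][m]
    return 2.0 * y[i][j] * (y[j][j] - y[i][i]) * rest

Y = [[3.0, 0.2, 0.1],
     [0.2, 2.0, 0.15],
     [0.1, 0.15, 1.0]]  # example symmetric matrix (assumed)
h = 1e-6
pairs = [(0, 1), (0, 2), (1, 2)]
errors = []
for pair in pairs:
    numeric = (j2_along_angle(Y, pair, h) - j2_along_angle(Y, pair, -h)) / (2 * h)
    errors.append(abs(numeric - j2_gradient_at_zero(Y, pair)))
print(errors)
```

Only the two diagonal entries touched by the rotation change with the angle, which is why the closed form involves just the pair $(i_k, j_k)$ plus a common product factor.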

Stochastic descents will now be described for the two example cost functions $J_1$ and $J_2$. For
the Karhunen-Loeve filtering of a random signal, it is generally desirable to provide stochastic
versions of the deterministic gradient descent algorithms previously described. More particularly,
one would like to compute an update vector based on a single data vector such that the expected
value of the update is the same as the update computed in the deterministic algorithm. This gives
a stochastic algorithm that converges in mean under the step size conditions of the corresponding
deterministic algorithm.

In well-known heuristic analyses of the LMS algorithm, e.g., from the above-cited reference
P.M. Clarkson, "Optimal and Adaptive Signal Processing," CRC Press, Boca Raton, FL, 1993, an
"independence assumption" is used to simplify computations. Analyses based on the independence 
assumption are qualitatively accurate. One of the analyses below uses a similar independence 
assumption. 

The minimization of $J_1$ by stochastic gradient descent will now be described. This process
is referred to herein as Algorithm 1. To obtain an update following equation (7), let

$$\theta_k = -\tfrac{4}{3}\, \mu\, y_{i_k} y_{j_k} \left( y_{j_k}^2 - y_{i_k}^2 \right), \qquad (13)$$

where $y = T_n x_n$ is the transform output and the time index $n$ has been suppressed. For Gaussian data,

$$E[\theta_k] = -4 \mu\, Y_{i_k j_k} \left( Y_{j_k j_k} - Y_{i_k i_k} \right). \qquad (14)$$

Therefore, the expected parameter vector changes in an algorithm using equations (2) and (13) are
the same as those in the deterministic iteration of equations (2) and (3) with (7). The factor of 3
difference between equations (13) and (14) is specific to a Gaussian source. For a non-Gaussian
source, one can absorb this factor into the step size selection.

Consider the adaptive filtering of a Gaussian signal with covariance matrix $X$ using equations
(2) and (13). It can be shown that the mean of the transform, in Givens parameterized form,
converges locally under the conditions given in the above description of the corresponding
deterministic descent.

Notice that in equation (13) the $\theta_k$ parameter depends only on the $i_k$ and $j_k$ components of $y$.
This is desirable because it allows a parallelization of computations and a regularity of data flow,
as is illustrated below in conjunction with FIG. 2.
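
A single stochastic update of this kind might be sketched as follows. The fragment assumes that the per-sample angle takes the form $-\tfrac{4}{3}\mu\, y_{i_k} y_{j_k}(y_{j_k}^2 - y_{i_k}^2)$, which is one reading of equation (13) consistent with the factor of 3 noted for Gaussian sources; the function names, the 0-based indexing, the synthetic two-dimensional source and the step sizes are likewise illustrative assumptions.

```python
import math
import random

def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def givens(n, i, j, theta):
    g = identity(n)
    c, s = math.cos(theta), math.sin(theta)
    g[i][i], g[i][j] = c, s
    g[j][i], g[j][j] = -s, c
    return g

def algorithm1_step(t, x, mu):
    """One stochastic update T <- U_Theta T driven by a single data vector x."""
    n = len(x)
    y = [sum(t[r][c] * x[c] for c in range(n)) for r in range(n)]
    u = identity(n)
    for i in range(n):
        for j in range(i + 1, n):
            # Each angle uses only the two transform outputs of its own pair.
            theta = -(4.0 / 3.0) * mu * y[i] * y[j] * (y[j] ** 2 - y[i] ** 2)
            u = matmul(u, givens(n, i, j, theta))
    return matmul(u, t)

# A hand-checkable single step: the angle is -(4/3)(0.01)(1)(2)(4 - 1) = -0.08.
T1 = algorithm1_step(identity(2), [1.0, 2.0], 0.01)

# A short adaptation run on a synthetic correlated Gaussian source (assumed).
random.seed(0)
T = identity(2)
for _ in range(2000):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x = [z1, 0.5 * z1 + z2]
    T = algorithm1_step(T, x, 0.001)

TTt = [[sum(T[r][k] * T[c][k] for k in range(2)) for c in range(2)]
       for r in range(2)]
orth_dev = max(abs(TTt[r][c] - (1.0 if r == c else 0.0))
               for r in range(2) for c in range(2))
print(T1, orth_dev)
```

Since each angle depends only on its own pair of outputs, the inner loop body could be evaluated for all pairs in parallel; the sketch keeps it sequential for readability.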

The minimization of $J_2$ by stochastic gradient descent will now be described. This process
is also referred to herein as Algorithm 2. In order to obtain such an algorithm, following equation
(12), one possible choice is

$$\theta_k = -\mu \left[ \prod_{i \neq i_k, j_k} y_i^2 \right] \tfrac{2}{3}\, y_{i_k} y_{j_k} \left( y_{j_k}^2 - y_{i_k}^2 \right). \qquad (15)$$

If the source is Gaussian and, for the purpose of examining the $k$th parameter, it is assumed
that $y_{i_k}$ and $y_{j_k}$ are independent of the remaining components of $y$,

$$E[\theta_k] = -\mu \left[ \prod_{i \neq i_k, j_k} Y_{ii} \right] 2 Y_{i_k j_k} \left( Y_{j_k j_k} - Y_{i_k i_k} \right). \qquad (16)$$

Consider the adaptive filtering of a Gaussian signal with covariance matrix $X$ using equations
(2) and (15). Using the independence assumption that yields equation (16), it can be shown that the
mean of the transform, in Givens parameterized form, converges locally under the conditions given
previously in conjunction with the corresponding deterministic descent.

With the exception of the $\prod_{i \neq i_k, j_k} y_i^2$ factor, which may be incorporated in an update calculation
block 100 such as that shown in FIG. 2, the update rule (15) shares the desirable regularity and
parallelizability features of the previously-described Algorithm 1.

FIG. 2 shows a schematic representation of a signal processing device which implements a
stochastic gradient update in a Givens parameterized form. The "$\nabla$" blocks represent gradient
computations based on equations (13) or (15). It should be noted that not all data dependencies are
shown for equation (15) in this particular representation. The update calculation block 100 is a
scaling of the gradient by step size $\mu$, and the multiplication of Givens parameterized matrices may
be implemented using conventional techniques, such as those described in, e.g., F.J. Vanpoucke et
al., "Factored Orthogonal Transformations For Recursive Eigendecomposition," IEEE Trans.
Circuits Syst.-II: Anal. Dig. Sig. Proc., Vol. 44, No. 3, pp. 253-256, March 1997, which is
incorporated by reference herein.

Pairwise Energy Compaction Algorithm (Algorithm 3) 

1 5 Algorithms 1 and 2 in the previous section are based on a global property. More specifically, 

they are based on the diagonalizing property of a KLT; i.e., that the output of a KLT has a diagonal 
covariance matrix. In this section a different property of the KLT, i.e., maximal energy compaction, 
is used to derive a gradient descent algorithm illustrative of the present invention. 

With the conventional sorting of basis vectors, the output of a KLT satisfies the following 

20 property of maximizing the energy in the lowest-numbered components: 

m m 

^Y„= max Y Y n , for m=\,2,...,N. 

j_ j orthogonal T j_ j 

See, e.g., K.R. Rao, "Discrete Cosine Transform: Algorithms, Advantages, Applications," Academic 
25 Press, 1990, which is incorporated by reference herein. This property will be utilized in the 
following equivalent, but less common form 
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Xy„= min JX for m=l 9 2 9 ... 9 N. (17) 

/=m orthogonal T l=m 



This form uses gradient descent instead of the gradient ascent of the previous form. Because of the
essential uniqueness of the KLT, the energy packing property can be indirectly captured by the
global cost functions $J_1$ and $J_2$. However, this section will not be concerned with global behavior,
but will instead determine how the energy packing property manifests itself between pairs of
transform coefficients. Having local optimality in each pair gives the desired global property as a
byproduct.
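The equivalence of the maximizing and minimizing forms can be checked with a short calculation.
The sketch below assumes that the component energies are the diagonal entries $Y_{ii}$ of
$Y = TXT^T$ (notation assumed here) and relies only on the invariance of the trace under an
orthogonal $T$:

```latex
\operatorname{tr}(Y) = \operatorname{tr}(T X T^{T}) = \operatorname{tr}(X T^{T} T) = \operatorname{tr}(X)
\quad\Longrightarrow\quad
\sum_{i=m}^{N} Y_{ii} = \operatorname{tr}(X) - \sum_{i=1}^{m-1} Y_{ii}.
```

Since $\operatorname{tr}(X)$ does not depend on $T$, minimizing the energy in components $m$ through
$N$ is the same as maximizing the energy in components 1 through $m-1$.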

Referring again to the schematic representation of FIG. 1, if the transform is a KLT, then $y_4$
has minimum possible power, $y_3$ has minimum power given that $y_4$ is fixed, etc. This suggests that
a KLT can be obtained by adjusting each Givens rotation in FIG. 1 to minimize the power sent to
the lower branch. As will be demonstrated below, this approach in fact leads to a locally convergent
adaptation algorithm.

Regardless of the size of the overall transform, the operation of rotation block $G_k$ is described
by

$$\begin{bmatrix} x''_{i_k} \\ x''_{j_k} \end{bmatrix}
= \begin{bmatrix} \cos\theta_k & \sin\theta_k \\ -\sin\theta_k & \cos\theta_k \end{bmatrix}
\begin{bmatrix} x'_{i_k} \\ x'_{j_k} \end{bmatrix}
= \begin{bmatrix} x'_{i_k}\cos\theta_k + x'_{j_k}\sin\theta_k \\ -x'_{i_k}\sin\theta_k + x'_{j_k}\cos\theta_k \end{bmatrix},$$

where $x'$ and $x''$ denote the input and output, respectively, of that transform stage. Now since

$$\frac{\partial (x''_{j_k})^2}{\partial\theta_k}
= 2x''_{j_k}\frac{\partial x''_{j_k}}{\partial\theta_k}
= 2x''_{j_k}\left(-x'_{i_k}\cos\theta_k - x'_{j_k}\sin\theta_k\right)
= -2x''_{j_k}x''_{i_k}, \quad (18)$$

increasing $\theta_k$ by $2\mu x''_{i_k}x''_{j_k}$ is an instantaneous gradient descent of the energy in
the lower branch. This gradient is very convenient because it is simply the product of quantities
that are computed in the transform itself.
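The gradient in equation (18) can be checked numerically. The following Python sketch applies a
single rotation stage to one input pair, compares the analytic derivative of the lower-branch
energy with a central finite difference, and takes one descent step. The function names and the
sample values are hypothetical and serve only to illustrate the computation.

```python
import math


def givens_stage(xi, xj, theta):
    """Apply one Givens rotation to the pair (xi, xj); returns (upper, lower)."""
    c, s = math.cos(theta), math.sin(theta)
    return xi * c + xj * s, -xi * s + xj * c


def lower_energy_gradient(xi, xj, theta):
    """Derivative of lower**2 with respect to theta, following equation (18)."""
    upper, lower = givens_stage(xi, xj, theta)
    return -2.0 * upper * lower


def descent_step(xi, xj, theta, mu):
    """Increase theta by 2*mu*upper*lower, i.e. step against the gradient."""
    upper, lower = givens_stage(xi, xj, theta)
    return theta + 2.0 * mu * upper * lower


if __name__ == "__main__":
    xi, xj, theta, h = 0.8, -0.3, 0.4, 1e-6
    numeric = (givens_stage(xi, xj, theta + h)[1] ** 2
               - givens_stage(xi, xj, theta - h)[1] ** 2) / (2 * h)
    print(numeric, lower_energy_gradient(xi, xj, theta))
```

Both printed values should agree closely, and a small step with `descent_step` lowers the energy of
the lower output for this input pair.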

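To ascertain the convergence in mean of the new iteration described above, the expected
update of $\theta_k$, derived from equation (18), is used:

$$\begin{aligned}
E\left[2\mu\,x''_{i_k}x''_{j_k}\right]
&= E\left[2\mu\left(x'_{i_k}\cos\theta_k + x'_{j_k}\sin\theta_k\right)\left(-x'_{i_k}\sin\theta_k + x'_{j_k}\cos\theta_k\right)\right] \\
&= 2\mu\left\{\tfrac{1}{2}\left(E\left[(x'_{j_k})^2\right] - E\left[(x'_{i_k})^2\right]\right)\sin 2\theta_k
+ E\left[x'_{i_k}x'_{j_k}\right]\cos 2\theta_k\right\}. \quad (19)
\end{aligned}$$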



Since $x'_{i_k}$ and $x'_{j_k}$ depend on all earlier rotation stages, the components of $\Theta$ are
coupled in a complicated way. However, using the approach of the previous section, the descent can
be used to compute a multiplicative update, as illustrated in FIG. 2. The resulting update algorithm
is referred to herein as Algorithm 3. The convergence in mean then depends only on the behavior of
the expected gradient (19) around $\Theta = 0$.
Denote the eigenvalues of $X$ by $\lambda_1 > \lambda_2 > \cdots > \lambda_N$. It can be shown that the
deterministic iteration described by equation (2) with

$$\theta_k = 2\mu\,Y_{i_k j_k}$$

converges locally to a diagonalizing transform if

$$0 < \mu < \left[\max_k\left(\lambda_{i_k} - \lambda_{j_k}\right)\right]^{-1} = \left(\lambda_1 - \lambda_N\right)^{-1}.$$

Under the same step size bounds, it can be shown that the stochastic iteration described by equation
(2) with

$$\theta_k = \mu\,2y_{i_k}y_{j_k} \quad (20)$$

gives a transform sequence that, in Givens parameterized form, converges locally.
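The step size bound can be illustrated in the smallest case. The Python sketch below runs a
deterministic iteration for a 2 x 2 diagonal $X$, assuming that equation (2) is a multiplicative
update in which the current transform is left-multiplied by a Givens rotation whose angle is
$2\mu Y_{12}$; that reading of equation (2), the helper names, and the numerical values are
assumptions made for this example.

```python
import math


def rot(theta):
    """2 x 2 Givens rotation with the sign convention used in the text."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


def matmul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def residual_after(lam1, lam2, theta0, mu, steps):
    """Iterate T <- U T with update angle 2*mu*Y[0][1]; return final |Y[0][1]|."""
    x = [[lam1, 0.0], [0.0, lam2]]
    t = rot(theta0)
    for _ in range(steps):
        y = matmul(matmul(t, x), transpose(t))
        t = matmul(rot(2.0 * mu * y[0][1]), t)
    y = matmul(matmul(t, x), transpose(t))
    return abs(y[0][1])


if __name__ == "__main__":
    # Eigenvalues 1 and 0.5 give the bound mu < 1 / (1 - 0.5) = 2.
    print(residual_after(1.0, 0.5, 0.5, 1.0, 50))   # inside the bound
    print(residual_after(1.0, 0.5, 0.5, 2.5, 50))   # outside the bound
```

With the step size inside the bound the off-diagonal entry of $Y$ is driven to zero, while with
the larger step size the iteration does not settle at a diagonalizing transform.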

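As in the proofs of convergence given for the deterministic descents described previously,
the expected value of the parameter vector is described by a discrete-time dynamical system.
Linearizing about the desired equilibrium, the stability of

$$\Theta_{n+1} = (I + \mu F)\,\Theta_n, \quad \text{where } F_{k\ell} = \frac{1}{\mu}\,\frac{\partial\,E\left[2\mu\,x''_{i_k}x''_{j_k}\right]}{\partial\theta_\ell}\bigg|_{\Theta=0},$$

is established. This yields for the diagonal elements of $F$,

$$\begin{aligned}
F_{kk} &= \frac{\partial}{\partial\theta_k}\left\{\left(E\left[(x'_{j_k})^2\right] - E\left[(x'_{i_k})^2\right]\right)\sin 2\theta_k
+ 2E\left[x'_{i_k}x'_{j_k}\right]\cos 2\theta_k\right\}\bigg|_{\Theta=0} \\
&= \left[2\left(E\left[(x'_{j_k})^2\right] - E\left[(x'_{i_k})^2\right]\right)\cos 2\theta_k
- 4E\left[x'_{i_k}x'_{j_k}\right]\sin 2\theta_k\right]_{\Theta=0} \\
&= 2\left(E\left[x_{j_k}^2\right] - E\left[x_{i_k}^2\right]\right), \quad (21)
\end{aligned}$$

and the off-diagonal elements are

$$F_{k\ell} = \left[\frac{\partial\left(E\left[(x'_{j_k})^2\right] - E\left[(x'_{i_k})^2\right]\right)}{\partial\theta_\ell}
\underbrace{\sin 2\theta_k}_{\to 0 \text{ as } \Theta \to 0}
+ 2\underbrace{\frac{\partial E\left[x'_{i_k}x'_{j_k}\right]}{\partial\theta_\ell}}_{\to 0 \text{ as } \Theta \to 0}\cos 2\theta_k\right]_{\Theta=0} = 0. \quad (22)$$

The eigenvalues of $F$ are $2(\lambda_{j_k} - \lambda_{i_k})$, $k = 1, 2, \ldots, K$; thus the eigenvalues of
$I + \mu F$ are $1 + 2\mu(\lambda_{j_k} - \lambda_{i_k})$. When all the eigenvalues lie inside the unit circle,
which is the case for $0 < \mu < (\lambda_1 - \lambda_N)^{-1}$, the dynamical system converges locally to the
desired equilibrium.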
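The stability condition can be evaluated directly from a list of eigenvalues. The following Python
sketch computes the eigenvalues of the linearized update and the corresponding upper limit on the
step size, assuming one rotation per pair of distinct eigenvalues; the function names and the
example eigenvalues are hypothetical.

```python
from itertools import combinations


def mode_factors(eigenvalues, mu):
    """Eigenvalues 1 + 2*mu*(lam_j - lam_i) of I + mu*F, one per pair with lam_i > lam_j."""
    lams = sorted(eigenvalues, reverse=True)
    return [1.0 + 2.0 * mu * (lj - li) for li, lj in combinations(lams, 2)]


def is_locally_stable(eigenvalues, mu):
    """True when every eigenvalue of I + mu*F lies strictly inside the unit circle."""
    return all(abs(f) < 1.0 for f in mode_factors(eigenvalues, mu))


def max_stable_step(eigenvalues):
    """Upper limit 1 / (lam_1 - lam_N) on the step size."""
    return 1.0 / (max(eigenvalues) - min(eigenvalues))


if __name__ == "__main__":
    lams = [1.0, 0.5, 0.25]
    print(max_stable_step(lams))
    print(is_locally_stable(lams, 1.3), is_locally_stable(lams, 1.4))
```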

Simulations and Comparisons 

The mean convergence rates of the above-described illustrative update algorithms will now
be examined in greater detail. The linearizations used in the above convergence proofs facilitate
estimation of rates of convergence. Consider Algorithm 1, the stochastic gradient descent with
respect to $J_1$. In the vicinity of the desired equilibrium, the error in $\theta_k$ at the $n$th iteration is
given approximately by



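$$\left[1 - 4\mu\left(\lambda_{i_k} - \lambda_{j_k}\right)^2\right]^{n} \times \left(\text{initial error in } \theta_k\right)$$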



based on equation (9). For local stability in this example, the bracketed quantity is required to be
in the interval (-1,1) for all $k$. Furthermore, the bracketed quantity should be as close to zero as
possible, in order to provide fast convergence. The ability to do this through the choice of $\mu$ is
limited by the variability of $(\lambda_{i_k} - \lambda_{j_k})^2$. Accordingly, one can define pseudo-eigenvalue
spreads for the three illustrative algorithms as follows:



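$$s_1(X) = \frac{\max_{i \neq j}\left(\lambda_i - \lambda_j\right)^2}{\min_{i \neq j}\left(\lambda_i - \lambda_j\right)^2}, \quad
s_2(X) = \frac{\max_{i \neq j}\left(\lambda_i - \lambda_j\right)^2 / \left(\lambda_i\lambda_j\right)}{\min_{i \neq j}\left(\lambda_i - \lambda_j\right)^2 / \left(\lambda_i\lambda_j\right)}, \quad \text{and} \quad
s_3(X) = \frac{\max_{i \neq j}\left|\lambda_i - \lambda_j\right|}{\min_{i \neq j}\left|\lambda_i - \lambda_j\right|}.$$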



These pseudo-eigenvalue spreads may be viewed as analogous to the eigenvalue spread in LMS
adaptive filtering as described in the above-cited P.M. Clarkson reference, "Optimal and Adaptive
Signal Processing," CRC Press, Boca Raton, FL, 1993.

Notice that a pseudo-eigenvalue spread is not less than one and that $s_1(X) = (s_3(X))^2 \geq s_3(X)$.
This suggests that with appropriate step size choices, Algorithm 3 will have faster local convergence
than Algorithm 1 for any signal. The difference between $s_1(X)$ and $s_2(X)$ suggests that the superior
choice between Algorithms 1 and 2 will depend on $X$ along with the choice of $\mu$. These observations
are confirmed through the following example.

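Consider the matrices

$$X_1 = \mathrm{diag}\left(1, \tfrac{7}{8}, \tfrac{5}{8}, \tfrac{1}{2}\right) \quad \text{and} \quad
X_2 = \mathrm{diag}\left(1, \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}\right),$$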

for which $s_1(X_1) = 16 < 28 = s_2(X_1)$ and $s_1(X_2) = 49 > \tfrac{49}{4} = s_2(X_2)$. Based on these
pseudo-eigenvalue spreads, it is expected that the Algorithm 1 global descent with respect to $J_1$
will perform better than the Algorithm 2 global descent with respect to $J_2$ for matrix $X_1$, and
vice versa for matrix $X_2$. For Algorithm 3, $s_3(X_1) = 4$ and $s_3(X_2) = 7$. These are the smallest
spreads, so it is expected that Algorithm 3 will have the fastest convergence for both of the
matrices in this example.
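The quoted spread values can be reproduced with exact rational arithmetic. The Python sketch below
assumes that each pseudo-eigenvalue spread is the ratio of the largest to the smallest value, over
all pairs of distinct eigenvalues, of $(\lambda_i - \lambda_j)^2$, of $(\lambda_i - \lambda_j)^2 / (\lambda_i\lambda_j)$, and of
$|\lambda_i - \lambda_j|$, respectively. The eigenvalue lists `X1` and `X2` are likewise assumptions, chosen
because they reproduce the spread values stated above.

```python
from fractions import Fraction as Fr
from itertools import combinations


def spread(values):
    """Ratio of the largest to the smallest value in a list."""
    return max(values) / min(values)


def pseudo_spreads(lams):
    """Return (s1, s2, s3) under the definitions assumed in this sketch."""
    pairs = list(combinations(lams, 2))
    s1 = spread([(a - b) ** 2 for a, b in pairs])
    s2 = spread([(a - b) ** 2 / (a * b) for a, b in pairs])
    s3 = spread([abs(a - b) for a, b in pairs])
    return s1, s2, s3


# Assumed eigenvalue sets for the two example matrices.
X1 = [Fr(1), Fr(7, 8), Fr(5, 8), Fr(1, 2)]
X2 = [Fr(1), Fr(1, 2), Fr(1, 4), Fr(1, 8)]

if __name__ == "__main__":
    print(pseudo_spreads(X1))
    print(pseudo_spreads(X2))
```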

FIGS. 3A and 3B show deterministic simulation results for each of the three algorithms for
the respective $X_1$ and $X_2$ matrices of the above example. Algorithm 1 is the global gradient descent
with respect to $J_1$ as in equation (13), Algorithm 2 is the global descent with respect to $J_2$ as in
equation (15), and Algorithm 3 is the pairwise energy compaction descent in accordance with
equation (20). The simulation results indicate that the performance is as predicted. In the
simulations, each algorithm was used with the step size $\mu$ that maximized the speed of convergence
of the slowest mode, as given in Table 1 below. Since the signal statistics are not known in advance,
the step size selection will generally not be optimal. However, the pseudo-eigenvalue spread
provides a bound on performance.







               X_1          X_2

Algorithm 1    32/17        16/25
Algorithm 2    1024/145     512/53
Algorithm 3    8/5          1

Table 1: Step Sizes Used in Simulations of Mean Convergence
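The Algorithm 1 and Algorithm 3 entries of Table 1 can be reproduced under a simple model. The
Python sketch below assumes that, near equilibrium, each mode of the error is multiplied per
iteration by $1 - a\mu d$, with $a = 2$ and $d = |\lambda_i - \lambda_j|$ for Algorithm 3, and $a = 4$ and
$d = (\lambda_i - \lambda_j)^2$ for Algorithm 1. These constants and the eigenvalue lists are assumptions
chosen to be consistent with the tabulated values. The step size that makes the slowest mode as
fast as possible then equalizes the magnitude of the factor at the smallest and largest $d$.

```python
from fractions import Fraction as Fr
from itertools import combinations


def best_step(lams, scale, power):
    """Step size equalizing |1 - scale*mu*d| at the smallest and largest d,
    where d = |lam_i - lam_j| ** power over all eigenvalue pairs."""
    d = [abs(a - b) ** power for a, b in combinations(lams, 2)]
    return Fr(2) / (scale * (min(d) + max(d)))


# Assumed eigenvalue sets for the two example matrices.
X1 = [Fr(1), Fr(7, 8), Fr(5, 8), Fr(1, 2)]
X2 = [Fr(1), Fr(1, 2), Fr(1, 4), Fr(1, 8)]

if __name__ == "__main__":
    print(best_step(X1, 4, 2), best_step(X2, 4, 2))  # model of Algorithm 1
    print(best_step(X1, 2, 1), best_step(X2, 2, 1))  # model of Algorithm 3
```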



Each curve in FIGS. 3A and 3B represents the average of 100 simulations with randomly chosen
initial transforms, corresponding to $\Theta$ uniformly distributed on $[-\pi/4, \pi/4)$. Convergence could
also be measured by $J_1$ or $J_2$, and results of this type are given in the above-cited Ph.D. thesis of
V.K. Goyal. At each iteration, the error in the transform is measured by the Frobenius norm of the
difference between a true KLT, i.e., an identity matrix, and the adapted transform. Since the
interest is in finding any KLT, this approach uses the minimum norm over all row permutations and
multiplications of rows by ±1:

$$\min_{\substack{\text{permutations } P \\ \text{diagonal } \Sigma \text{ s.t. } \Sigma^2 = I}} \left\| I - \Sigma P T \right\|_F .$$
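The error measure just described can be sketched by brute force for small transforms. The
following Python function tries every row permutation and every choice of row signs and keeps the
smallest Frobenius distance to the identity; the name `klt_error` is hypothetical, and the
exhaustive search is practical only for small dimensions.

```python
import itertools
import math


def klt_error(t):
    """Minimum Frobenius norm of I - S P t over row permutations P and
    diagonal sign matrices S, for a square matrix t given as a list of rows."""
    n = len(t)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        for signs in itertools.product((1.0, -1.0), repeat=n):
            err = 0.0
            for r in range(n):
                for c in range(n):
                    target = 1.0 if r == c else 0.0
                    # Row r of the candidate is signs[r] times row perm[r] of t.
                    err += (target - signs[r] * t[perm[r]][c]) ** 2
            best = min(best, math.sqrt(err))
    return best


if __name__ == "__main__":
    c, s = math.cos(0.1), math.sin(0.1)
    print(klt_error([[c, s], [-s, c]]))
    print(klt_error([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]))
```

A transform that differs from the identity only by a row permutation and sign changes is assigned
zero error, which reflects the fact that any such transform is an equally valid KLT.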



Stochastic simulations have also been performed on the three illustrative update algorithms.
In the stochastic setting, e.g., using equations (13), (15) or (20), transform updates are on average
in the correct direction, but may have high variance. The step size should therefore not be chosen
close to the previously-described predicted bound for stability. The simulations described below in
conjunction with FIGS. 4A, 4B and 4C indicate that a good choice is one or two orders of magnitude
less than the bound.

Consider by way of example a zero-mean Gaussian source with covariance matrix

$$X = \begin{bmatrix}
1 & 0.6 & 0.36 & 0.218 \\
0.6 & 1 & 0.6 & 0.36 \\
0.36 & 0.6 & 1 & 0.6 \\
0.218 & 0.36 & 0.6 & 1
\end{bmatrix}.$$



FIGS. 4A, 4B and 4C show stochastic simulation results for Algorithm 1, Algorithm 2 and
Algorithm 3, respectively, using the example four-dimensional source $X$ given above. The source
is Gaussian with $E[x_i x_j] = 0.6^{|i-j|}$. In all simulations the initial transform was the identity
transform. After finding the eigenvalues of $X$, predicted bounds for stability
$\mu_1^* \approx 0.1131$, $\mu_2^* \approx 0.5864$, and $\mu_3^* \approx 0.4757$ were computed for the three
algorithms, respectively. The simulations used $\mu = \mu^*/\gamma$ for various values of $\gamma$, where $\mu^*$ is
the maximum step size for convergence in mean. The resulting curves are labeled by $\gamma$ in the
figures. Each curve represents the average of 100 simulations.

Based on intuition from LMS filtering, one would expect a trade-off between speed of
convergence and steady-state error. The results shown in FIGS. 4A, 4B and 4C confirm this
intuition. More particularly, it can be seen from the curves that with a large step size (small $\gamma$),
steady state is approached quickly, but the steady-state error is large, while with smaller step sizes
(larger $\gamma$), steady-state error is smaller but convergence is slower. In each of FIGS. 4A, 4B and 4C,
the lower hull of the curves gives the minimum error for a given number of iterations, assuming a
fixed step size. This lower hull is lowest for Algorithm 3. The superiority of this algorithm is also
predicted by comparing pseudo-eigenvalue spreads.
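The trade-off between convergence speed and steady-state error can be explored with a small
stochastic experiment. The Python sketch below is a two-dimensional stand-in, not the
four-dimensional source of FIGS. 4A, 4B and 4C: it assumes a zero-mean Gaussian pair with
covariance [[1, 0.6], [0.6, 1]], whose KLT is a rotation by $\pi/4$, starts from the identity
transform, and applies the pairwise update with $\mu = \mu^*/\gamma$. The function name, the seed and the
sample counts are arbitrary choices made for illustration.

```python
import math
import random


def simulate(gamma, num_samples, seed=0):
    """Stochastic pairwise update for a 2-D source with covariance
    [[1, 0.6], [0.6, 1]]; returns the rotation angle after every sample."""
    rng = random.Random(seed)
    mu_star = 1.0 / (1.6 - 0.4)   # eigenvalues of the covariance are 1.6 and 0.4
    mu = mu_star / gamma
    theta, history = 0.0, []      # theta = 0 is the identity transform
    for _ in range(num_samples):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1, x2 = g1, 0.6 * g1 + 0.8 * g2          # correlated Gaussian pair
        c, s = math.cos(theta), math.sin(theta)
        y1, y2 = c * x1 + s * x2, -s * x1 + c * x2
        theta += 2.0 * mu * y1 * y2               # update angle 2*mu*y1*y2
        history.append(theta)
    return history


if __name__ == "__main__":
    for gamma in (10.0, 100.0):
        tail = simulate(gamma, 20000)[10000:]
        print(gamma, sum(tail) / len(tail), math.pi / 4)
```

Comparing the spread of the angle around $\pi/4$ for different values of `gamma` gives a feel for the
behavior described above: smaller `gamma` reaches the neighborhood of the KLT sooner but fluctuates
more once there.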

Implementation for Backward Adaptive Transform Coding 

An implementation of the invention for backward adaptive transform coding will now be
described. In adaptive transform coding, adaptation of the transform in the encoder must be
accompanied by corresponding adaptation in the decoder. With a typical forward adaptive structure,
changes to the transform must be encoded and sent along with the encoded data. An alternative is
to base the transform updates only on data available at the decoder. In this case, the decoder can
deduce the correct decoding transform by running the same adaptation algorithm as the encoder.
This is analogous to the use of backward adaptation of prediction filters and quantizer scaling in
adaptive differential pulse-coded modulation (ADPCM), as described in, e.g., R.V. Cox, "Speech
Coding," The Digital Signal Processing Handbook, Chapter 45, CRC and IEEE Press, pp. 45.1-45.19,
1998, which is incorporated by reference herein.

Of course, quantization is an irreversible reduction in information. For a given accuracy,
estimation of the source covariance requires more samples when using quantized data. Estimating
moments of the source by moments of the quantized data introduces a bias, which for a Gaussian
source is described in, e.g., L. Cheded, et al., "The Exact Impact of Amplitude Quantization on
Multi-Dimensional, High-Order Moments Estimation," Signal Proc., Vol. 39, No. 3, pp. 293-315,
September 1994, which is incorporated by reference herein. For uncorrelated components, the bias
disappears from cross moment estimates, such that the bias from quantization has no effect at
convergence. The primary effect of quantization is to increase the steady-state error when the
quantization is very coarse. Quantization may speed the initial stages of convergence by amplifying
small sample values.

FIG. 5 shows a signal processing device which includes an encoding and adaptation structure
for backward adaptive transform coding in accordance with the invention. Each of the elements
labeled Q represents a quantizer, but the structure is otherwise similar to that of the signal processing
device of FIG. 2.

When the transform adaptation is driven exclusively by quantized data that will be
transmitted to a decoder, it is not necessary to separately encode the transform update. Instead, by
using the same adaptation rules, the decoder can track the transform used in the encoder.
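The way a decoder can track the encoder without side information can be sketched in two
dimensions. In the Python example below, the encoder rotates each input pair, quantizes the result
with a uniform quantizer, and adapts its angle from the quantized values only; the decoder applies
the same rule to the received indices. The class name `BackwardAdaptiveCodec`, the test source, and
the values of the step size and quantizer step are assumptions made for this illustration.

```python
import math
import random


def rotate(theta, v):
    """Forward rotation of the pair v by the transform angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] + s * v[1], -s * v[0] + c * v[1])


def rotate_back(theta, v):
    """Inverse (transpose) of rotate."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


class BackwardAdaptiveCodec:
    def __init__(self, mu, delta):
        self.mu, self.delta, self.theta = mu, delta, 0.0

    def _adapt(self, indices):
        # The update uses only quantized values, which both sides know.
        y_hat = [q * self.delta for q in indices]
        self.theta += 2.0 * self.mu * y_hat[0] * y_hat[1]
        return y_hat

    def encode(self, x):
        y = rotate(self.theta, x)
        indices = tuple(round(c / self.delta) for c in y)
        self._adapt(indices)
        return indices

    def decode(self, indices):
        theta_used = self.theta          # angle the encoder used for this sample
        y_hat = self._adapt(indices)
        return rotate_back(theta_used, y_hat)


def run(num_samples=2000, mu=0.005, delta=0.25, seed=1):
    rng = random.Random(seed)
    enc, dec = BackwardAdaptiveCodec(mu, delta), BackwardAdaptiveCodec(mu, delta)
    in_sync, worst = True, 0.0
    for _ in range(num_samples):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = (g1, 0.6 * g1 + 0.8 * g2)
        x_hat = dec.decode(enc.encode(x))
        in_sync = in_sync and enc.theta == dec.theta
        worst = max(worst, math.hypot(x[0] - x_hat[0], x[1] - x_hat[1]))
    return in_sync, worst, enc.theta


if __name__ == "__main__":
    print(run())
```

Because both sides perform identical arithmetic on identical quantized data, the two angles remain
equal at every sample, and the reconstruction error is limited by the quantizer step alone.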

FIG. 6 shows the results of simulations of a backward adaptive version of Algorithm 3, using
the same example four-dimensional source $X$ as the previous example. The source in these
simulations is thus Gaussian with $E[x_i x_j] = 0.6^{|i-j|}$. The initial transform is again the identity
transform, and the step size is again $\mu = \mu^*/\gamma$, where $\mu^*$ is the maximum step size for convergence
in mean. Based on the results depicted in FIG. 4C, the step size $\mu = \mu_3^*/20 \approx 0.02378$ was chosen,
i.e., $\gamma = 20$. The quantization is unbounded and uniform with quantization step size $\Delta$. The curves
in FIG. 6 correspond to different values of $\Delta$, the step size of the uniform scalar quantizer used in
the encoder, and a curve for the no-quantization case is included for comparison purposes. Each
curve represents the average of 50 simulations.

It can be seen from the curves of FIG. 6 that for a reasonable range of quantization step sizes,
the use of quantized data increases the steady-state error slightly and has little effect on the rate of
convergence. Several other simulations of this type are presented in the above-cited Ph.D. thesis of
V.K. Goyal, including cases where the source is time-varying.

The above-described embodiments of the invention are intended to be illustrative only.
Alternative embodiments of the invention may utilize other signal processing devices, with different
arrangements and configurations of elements. For example, the invention may utilize
reduced-parameter forms other than the Givens parameterized form described herein, such as a
Householder form. The techniques of the invention are also applicable to any desired type of source
signal(s). Moreover, the invention may be used for a wide variety of different types of signal
processing applications other than those described herein. Furthermore, the above-described
processing operations of the invention may be implemented in whole or in part using one or more
software programs. These and numerous other alternative embodiments within the scope of the
following claims will be apparent to those skilled in the art.

Claims

What is claimed is:

1. A signal processing method comprising the steps of:

processing a signal in a signal processing device configured to implement a transform 
for producing a desired transformed output signal; and 

updating the transform during the processing step based on received data associated 
with the signal being processed, so as to track a basis associated with the transform; 

wherein the transform is represented in a reduced-parameter form and the updating 
step is implemented using computations involving the reduced-parameter form. 

2. The method of claim 1 wherein the transform comprises a Karhunen-Loeve transform. 

3. The method of claim 1 wherein the reduced-parameter form for an $N \times N$ transform
comprises fewer than $N^2$ parameters.

4. The method of claim 1 wherein an adaptation of the transform is represented directly as 
one or more changes in the reduced-parameter form. 



5. The method of claim 1 wherein the reduced-parameter form comprises a Givens
parameterized form.

6. The method of claim 5 wherein the updating step utilizes multiplications of Givens
parameterized matrices computed in parametric form.

7. The method of claim 1 wherein the reduced-parameter form comprises a Householder
form.

8. The method of claim 1 wherein the updating step avoids the need for an explicit
eigendecomposition operation in implementing the transform.

9. The method of claim 1 wherein the updating step makes adjustments in the transform so
as to minimize a negative gradient of a pairwise energy compaction property of the transform.

10. The method of claim 9 wherein the negative gradient minimization is locally convergent
in mean for a specified step size.

11. The method of claim 9 wherein the adjustment for a kth parameter of the transform
associated with a particular one of a plurality of Givens rotations is given by
$\theta_k = \mu\,2y_{i_k}y_{j_k}$, where $\mu$ is the step size of the gradient algorithm, $y_i$ and $y_j$ are
designated pairs of elements of a matrix $Y = TXT^T$, $T$ is a matrix representing the transform, and
$X$ is a matrix representing elements of the signal being processed.

12. The method of claim 1 wherein the transform comprises a backward adaptive transform 
and the updating step is driven by quantized data. 


13. An apparatus comprising: 

a signal processing device configured to implement a transform for processing a
signal so as to produce a desired transformed output signal, the device further being operative to
implement a process for updating the transform while processing the signal, in accordance with
received data associated with the signal, wherein the transform is represented in a reduced-parameter
form and the updating process is implemented using computations involving the reduced-parameter
form.

14. The apparatus of claim 13 wherein the transform comprises a Karhunen-Loeve
transform.

15. The apparatus of claim 13 wherein the reduced-parameter form for an $N \times N$ transform
comprises fewer than $N^2$ parameters.

16. The apparatus of claim 13 wherein an adaptation of the transform is represented directly
as one or more changes in the reduced-parameter form.

17. The apparatus of claim 13 wherein the reduced-parameter form comprises a Givens
parameterized form.

18. The method of claim 17 wherein the updating process utilizes multiplications of Givens 
parameterized matrices computed in parametric form. 

19. The apparatus of claim 13 wherein the reduced-parameter form comprises a Householder
form.

20. The apparatus of claim 13 wherein the updating process avoids the need for an explicit
eigendecomposition operation in implementing the transform.

21. The apparatus of claim 13 wherein the updating process makes adjustments in the
transform so as to minimize a negative gradient of a pairwise energy compaction property of the
transform.

22. The apparatus of claim 21 wherein the negative gradient minimization is locally
convergent in mean for a specified step size.

23. The apparatus of claim 21 wherein the adjustment for a kth parameter of the transform
associated with a particular one of a plurality of Givens rotations is given by
$\theta_k = \mu\,2y_{i_k}y_{j_k}$, where $\mu$ is the step size of the gradient algorithm, $y_i$ and $y_j$ are
designated pairs of elements of a matrix $Y = TXT^T$, $T$ is a matrix representing the transform, and
$X$ is a matrix representing elements of the signal being processed.




24. The apparatus of claim 13 wherein the transform comprises a backward adaptive 
transform and the updating step is driven by quantized data. 

25. A machine-readable medium for storing one or more software programs for use in
processing a signal in a signal processing device configured to implement a transform for producing
a desired transformed output signal, the one or more software programs when executed
implementing the step of:

updating the transform based on received data associated with the signal being
processed, so as to track a basis associated with the transform;

wherein the transform is represented in a reduced-parameter form and the updating
step is implemented using computations involving the reduced-parameter form.

Abstract

A signal processing device utilizes a stochastic approximation of a gradient descent algorithm
for updating a transform. The signal processing device is configured to implement the transform for
producing a desired transformed output signal, and the transform is updated using the stochastic
approximation of the gradient algorithm based on received data associated with the signal being
processed. The transform is represented in a reduced-parameter form, such as a Givens
parameterized form or a Householder form, such that the reduced-parameter form for an $N \times N$
transform comprises fewer than $N^2$ parameters. The updating process is implemented using
computations involving the reduced-parameter form, and an adaptation of the transform is
represented directly as one or more changes in the reduced-parameter form. The gradient algorithm
may be configured to minimize a negative gradient of a pairwise energy compaction property of the
transform. Advantageously, the gradient algorithm may be made locally convergent in mean for a
specified step size. The invention can also be implemented in a backward adaptive form in which
the updating process is driven by quantized data.

IN THE UNITED STATES
PATENT AND TRADEMARK OFFICE

Declaration and Power of Attorney 

As the below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below next to my name. 

I believe I am the original, first and sole inventor of the subject matter which is claimed
and for which a patent is sought on the invention entitled METHODS AND APPARATUS FOR
ADAPTIVE SIGNAL PROCESSING INVOLVING A KARHUNEN-LOEVE BASIS, the
specification of which is attached hereto.

I hereby state that I have reviewed and understand the contents of the above identified 
specification, including the claims, as amended by an amendment, if any, specifically referred to 
in this oath or declaration. 

I acknowledge the duty to disclose all information known to me which is material to 
patentability as defined in Title 37, Code of Federal Regulations, 1.56. 

I hereby claim foreign priority benefits under Title 35, United States Code, 119 of any 
foreign application(s) for patent or inventor's certificate listed below and have also identified 
below any foreign application for patent or inventor's certificate having a filing date before that 
of the application on which priority is claimed: 

None 

I hereby claim the benefit under Title 35, United States Code, 120 of any United States 
application(s) listed below and, insofar as the subject matter of each of the claims of this 
application is not disclosed in the prior United States application in the manner provided by the 
first paragraph of Title 35, United States Code, 112, I acknowledge the duty to disclose all 
information known to me to be material to patentability as defined in Title 37, Code of Federal 
Regulations, 1.56 which became available between the filing date of the prior application and the 
national or PCT international filing date of this application: 

None 

I hereby declare that all statements made herein of my own knowledge are true and that 
all statements made on information and belief are believed to be true; and further that these 
statements were made with the knowledge that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States 
Code and that such willful false statements may jeopardize the validity of the application or any 
patent issued thereon.

I hereby appoint the following attorney(s) with full power of substitution and revocation,
to prosecute said application, to make alterations and amendments therein, to receive the patent,
and to transact all business in the Patent and Trademark Office connected therewith:

John P. Veschi



(Reg. No. 39058) 
(Reg. No. 29355) 
(Reg. No. 27407) 
(Reg. No. 36304) 
(Reg. No. 17765) 



I hereby appoint the attorney(s) on ATTACHMENT A as associate attorney(s) in the 
aforementioned application, with full power solely to prosecute said application, to make 
alterations and amendments therein, to receive the patent, and to transact all business in the Patent 
and Trademark Office connected with the prosecution of said application. No other powers are 
granted to such associate attorney(s) and such associate attorney(s) are specifically denied any 
power of substitution or revocation. 



Full name of sole inventor: Vivek K. Goyal 



Inventor's signature: [signature]

Residence: Hoboken, Hudson County, New Jersey 




Date TttM 7, l990 



Citizenship: United States of America 



Post Office Address: 



156 7th Street, Apt B 
Hoboken, New Jersey 07030

ATTACHMENT A



Attorney Name(s): Joseph B. Ryan Reg. No. 37922 
Kevin M. Mason Reg. No. 36597 
William E. Lewis Reg. No. 39274 



Telephone calls should be made to Joseph B. Ryan of Ryan & Mason, L.L.P. at: 

Phone No.: (516)759-7517 
Fax No.: (516)759-9512 

All written communications are to be addressed to: 

Ryan & Mason, L.L.P. 

90 Forest Avenue 

Locust Valley, New York 11560



